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Abstract 

The problem of fast computation of multivariate kernel density estimation (KDE) 
is still an open research problem. In our view, the existing solutions do not resolve this 
matter in a satisfactory way. One of the most elegant and efficient approach utilizes 
the fast Fourier transform. Unfortunately, the existing FFT-based solution suffers 
from a serious limitation, as it can accurately operate only with the constrained (i.e., 
diagonal) multivariate bandwidth matrices. In this paper we describe the problem 
and give a satisfactory solution. The proposed solution may be successfully used 
also in other research problems, for example for the fast computation of the optimal 
bandwidth for KDF. 
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1 Introduction 


Kernel density estimation is o ne of the most important statis t ical to o l with many practica l 
applications, see for example Kulczvcki and Charvtanowicz ( 2010 1: Ischauer et al. ( 2013 1 
and many others. It has been applied successfully to both univariate and multivariate 
problems. There exist s exten s ive lit e rature on this issue, inc ludi ng several classica l mono¬ 


graphs, see 


SilvermanI ( 1998 1. 


Scott 


( I 992 I) . ISimonoffl ( 199611 and I Wand and JonesI ( 19951) . 


A general form of the d-dimentional multivariate kernel density estimator is 


f{x,H) = -y^KH{x-Xi 
n 


2 = 1 


Kh[u) = 


u 


( 1 ) 


where H is the symmetric and positive dehnite dxd matrix (called bandwidth or smoothing 
matrix), d is the problem dimensionality, x = {xi,X 2 , - ■ ■ , x^)^, Xj = {Xu, Xi 2 , • • • , Xj^)^, 
z = 1, 2, • • • , n is a sequence of independent identically distributed (iid) d-variate sample 
drawn from some distribution with an unknown density /, K and Kh are the unsealed and 
scaled kernels, respectively. In most cases the kernel has the form of a standard multivariate 
normal density. 

There are two main computational problems related to KDE: (a) fast evaluation of the 
kernel density estimate f{x,H), (b) fast estimation of the optimal bandwidth matrix H 
(or scalar h in the univariate case). In the paper we concentrate on the hrst problem. 

It is obvious from Eqn. ([1]) that the naive direct evaluation of the KDE at m evaluation 
points for n data points requires 0{mn) kernel evaluations. Evaluation points can be of 
course the same as data points and then the computational complexity is 0{n^) making it 
very expensive, especially for large datasets and higher dimensions. 


A number of met 
ample 


Raykar et al 


rods have been proposed to accelerate the computations. See for ex- 


(120101) for an interesting review of the methods. O ther techniques, like 


for ex ample usage of Graphics Processing Units (GPUs) are also used (1 Andrzejewski et al. 


2 OI 3 I) . One of the most elegant and effective methods is based on using the fast Fourier 
tr ansform (EFT). A preliminary work on using EFT to kernel density estimation was given 


m 


SilvermanI (119821 ) (only for univariate case). 
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In the paper we a re concerned with an FF T-based method that was originally described 


Wandl fll9941) . In 


Wand and Jones 


f 19951. 


appendix D) an illustrative toy example has 


been presented. The method works very well for univariate case but, unfortunately, its 
multivariate extension does not support unconstrained bandwidth matrices. From now on 
this method will be called Wand’s algorithm. The FFT-based method investigated in this 
paper can be easily adapted also in other algorithms, for example for the fast computation 
of the optimal bandwidth for K DE. An appropriate research pa per is in preparation and 


its draft version can be found in 


Gramacki and Gramackil (120151) . 


The remainder of the paper is organized as follows: in Section |2] we briefly present 
the FFT-based algorithm and indicate its limitations, in Section [3] we demonstrate the 
problem. In Section H] we identify the source of the problem and propose a satisfactory 
solution. In Section E] we conclude the paper. 


2 FFT-based algorithm for KDE 


Below we briefly present Wand’s algorithm. It consists of 3 bas ic step s . In 


the multivariate linear binning (a kind of data discretization, see IWandl (119941) 1 of the input 


he first step 


random variables X* is required. The binning approximation of Eqn. ([T]) is 


. Ml Mi 

hoi, = (a - 9‘) Q (2) 

^ 1=1 ld = ^ 

where g are equally spaced grid points and c are grid counts. Grid counts are obtained by 
assigning certain weights to the grid points, based on neighbouring observations. In other 
words, each grid point is accompanied by a corresponding grid count. 

The following notation is used: for A: = 1,..., d, let gki < ■ ■ ■ < gkM^ be an equally 
spaced grid in the A:th coordinate directions such that [gki^Qku^ contains the A:th coordinate 
grid points. Here Mk is a positive integer representing the grid size in direction k. Let 


9j = {giji,■■■,gdji), 1 < jfc < Mfc, k = i,...,d 


(3) 


denote the grid point indexed by j = (ji,... ,jd) and the A:th binwidth be denoted by 

gkMk ~ gki 


5k — 


Mk 


(4) 
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( 5 ) 


In the second step Eqn. ([2]) is rewritten so that it takes a form of the convolution 

Mi-l Md-1 

q=-(Mi-l) ld=-{Md-l) 


where 


ki =—Kni^ihr " i^dld)- ( 6 ) 

n 

In the third step we compute the convolution between Cj_i and ki using the FFT 
algorithm in only 0{Mi log Mi... Md log Md) operations compared to the 0{Mf ... Mj) 
operations required for direct computation of Fqn. ([2]). 

In practical implementations of Wand’s algorithm, the sum limits in {Mi, • • • , Md} can 
be additionally shrunk to some smaller values (Li,--- ,Ld}, which signihcantly reduces 
the computational burden. In most cases, the kernel K is the multivariate normal density 
function and, as such, an effective support can be dehned, i.e., the region outside which the 
values of K are practically negligible. Our proposed formula for calculating Lk, k = 1, ■ ■ ■ ,d 
is given in Section 01 Now Eqn. ([5]) can be hnally rewritten as 


Li Ld 

■■■ 

h=—Li ld=—Ld 


(7) 


Although the above presented 3-step algorithm is very fast and accurate it suffers from 
a serious limitation. It supports only a small subset of all possible mult ivariate kernels of 
intere st. Two commonly used kernel types are product and radial ones fjWand and Jonesl . 


19951) . The problem reveals if the radial kernel is used and the bandwidth matrix H is 


unconstrained^ that is H E iF, where dF denotes the class of symmetric, positive dehnite 
d X d matrices. If, however, the bandwidth matrix belongs to a more restricted constrained 
{diagonal) form (that is H eF) the problem doesn’t manifest itself. 


(1994 

) and in the kdejks} R function ( 

Duone. 

291511 


Wand and Jones 

(1995 

), 

Wand 


. Moreover, many authors cite the 


FFT-based algorithm for KDE mechanically, without any qualihcation or mentioning its 
greatest limitation. 


^Starting from version 1.10.0 of the ks package, the FFT-based solution presented in this paper was 
successfully implemented there. 
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3 Problem demonstration 


In Figure [T] we demonstrate the problem mentioned in Section [2l A sample unicef dataset 
from the ks R package was used. For simplicity only 2D examples are shown but extension 
for higher dimensions is not difficult. For better readability, the authors’ own R codes 
were used and they are provide d as suppleni ental material^. Wand’s algorithm is im- 
plemented in the ks R p ackage (iDuond. 120151) . as well as in the KernSmooth R package 
(IWand and Ripleyi . 120151 ). However, the KernSmooth implementation supports only prod¬ 
uct kernels. The standard density!stats} function uses FFT to compute univariate 
kernel density estimates only. 


FFT not used, H unconstrained FFT used, H unconstrained 


H diagonal 





Figure 1: Density estimations for the sample unicef dataset with and without using FFT, 
for both unconstrained and constrained bandwidth matrices. Description of each plot is 
given in the text. 

The density estimation depicted in Figure [I^a) can be treated as the reference. It was 
calculated directly according Eqn. ([2]). The bandwidth H was unconstrained and was cal¬ 
culated using the Hpijks} R function. In Figure [Hb) the density estimation was calculated 
using Wand’s algorithm. The bandwidth H was exactly the same as in Figure [T}a). It 
is easy to notice that the result is obviously inaccurate, as the results in Figures [T](a) and 

^During experiments a small bug in binningfks} R function was found. According to binning definition, 
grid counts entries in ci must sum to n. A quick experiment with binningfksj shows that it returns wrong 
results, while the authors’ version returns the correct ones. The corrected version of the binning function 
is also included in the supplemental materials. 
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[Hb) should be the same. The density estimation depicted in Figure [^c) is for the diagonal 
bandwidth H (calculated using the Hpi.diagjks} R function). The KDE is exactly the 
same, regardless of using Wand’s algorithm or direct calculations. It is important to see 
that Figures[T](b) and[T](c) are very similar. This similarity suggests that Wand’s algorithm 
lose in some way most (or even all) the information carried by off-diagonal entries of the 
bandwidth matrix H. In other words Wand’s algorithm (in it’s current form) is adequate 
only for constrained bandwidths. 


4 Problem identification and its solution 

4.1 The current form of the algorithm 

To compute the convolution ((51) (or optionally ((71)) of two vectors the discrete convolution 
theorem is used. However, this theorem requires two main assnmptions abont the two input 
vectors, that is c and k. These wectors in signal processing’s terminology are caled input 
signal and impulse response, respectively. The hrst assumption states that the two vectors 
must have the same length and the second assumption requires that the input signal be 
treated as a periodic one. The consequence of the above is that the so called zero-padding 
and wrap-a r ound ordering procedur es are r equire d. Details can be fonnd for example in 


Press et al 


fll992 


Chapter 13). In IWandl (ll994l) the author suggests reshaping c and k 
as in dH]) and ((9]). Here, for simplicity, only the two-dimensional variant is presented, as 
extensions to higher dimensions are straightforward. We have 


k = 


o 

o' 

^0,1 

^0,M2 


^0,7U2 

ko,i 

^1,0 

^1,1 

ki,M2 

0 

^1,M2 

■ ■ ■ ^1,1 





^Mi,M2 

• • • 


0 


0 


0 



• ^Mi,M2 

0 

^Mi,M2 

• • • 

kifl 

^1,1 

ki,M2 


ki,M2 

• • • ^1,1 


( 8 ) 
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and 


Cl,l 

Cl,2 • ■ 






0 

CMi,1 

CMi,2 • ■ 

,7Vf2 



0 


0 


If one prefers to make use of the effective support property, as in Mi and M 2 must 
be replaced by Li and L 2 , respectively. However, this is only a technical procedure which 
does not affect the problem under consideration. 

The sizes of the zero matrices are chosen so that after the reshaping of c and fc, they 
both have the same dimension Pi x P 2 , x,..., xP^ (highly composite integers; typically, a 
power of 2). Now, to get the searched density estimate / from (E]) we can apply the discrete 
convolution theorem, that is, we must do the following operations: 


C = F{c), K = F{k), S = CK, s = F-\S) (10) 


where F stands for the Fourier transform and F~^ is its inverse. The sought density 
estimate / corresponds to a subset of s in Eqn. flTU]) divided by the product of Fi, ^ 2 , • • •, 
(the so-called normalization), that is 


/ = 


-s[l : : Mrf] 


( 11 ) 


PlP2...Pd 

where, for the two-dimensional case, s[a : 6, c : d] means a subset of rows from a to b and 
a subset of columns from c to d of the matrix s. 

Now we will try to discover what is wrong with k anc_ 
described in Section [21 Wand’s algorithm presented in 


c matrices, causing the problems 

Wand and Jones 

( 

1995 

, appendix 

. In 

Wand 

(1994 

) the algorithm is 


generalized for higher dimensions. However, this generalization supports only constrained 
(diagonal) bardwidth matrices H. If we carefully look at (IH]) it is easily to recognize that 
the wrap-around ordering used will support only kernels in orientations according to the 
coordinate axes, that is those where H is diagonal. If H is unconstrained many entries in 
ki of (|5]) required to compute fj does not occur in ([8]). In other words entries for ‘negative 
times’ (for example /c_i ,2 or /i:_i _i) will not be recovered by the wrap-around ordering as 
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k-i ^2 7 ^ ki ^2 and /c-i-i 7 ^ /ci,i and so on. The above pairs of entries would be equal only if 
H were diagonal. This implicitly explains why Figure (U^b) is so similar to Figure [T](c). 


4.2 The corrected algorithm 


Regarding the problems described in the previous subsection, we propose a different re¬ 
shaping for k and c (now renamed to knew and Cnew) which removes the problem presented 
in the paper. Note that now wrap-around ordering is not utilized, only zero-padding is 
used. So, we have 


k — 

^new 




Ml,—M 2 


k_ 


kn_- 


0 ,—A^2 


^Mi,—M2 


Mi,0 


^0,0 


k_ 


Ml,M 2 


^0,A^2 


and 


Crtp'in — 


0 


0 


0 


kuifi 

0 

0 


k 


Ml,M 2 


( 12 ) 


Cl.l 


Cl,M 2 


0 


0 


CM\,1 * * * 


0 


0 


(13) 


where the entry ci^ in Eqn. flT^ is placed in row Mi and column M 2 . The sought density 
estimate / corresponds to a subset of s in Eqn. (ITOD divided by the product of Pi, P 2 , • • • 1 -Pd 
(the so-called normalization), that is 
1 


/ = 


*[(2Mi - 1) : (3Mi - 2),..., (2Mrf - 1) : (3Mrf - 2)]. 


(14) 


Pi P2...Pd 

As was mentioned in Section O Mk values can be shrunk to some smaller values L^- 
We propose to calculate using the following formula (/c = 1, • • • , d) 


Lfc = min M^ — 1 , ceiling 


T 


(15) 








where A is the largest eigenvalue of H and 5k is the mesh size from Eqn. (II]). After some 
empirical tests we have found that r can be set to around 3.7 for a standard two-dimensional 
normal kernel. 

After implementing the improved version of Wand’s algorithm (based on flT^ and (lT3|) i 
and calculating density estimation analogous to that depicted in Figure [I](b) we can easily 
conclude that now the plot is identical to the one from Figure [11(a). This implies that 
the weakness of the original algorithm being the main paper’s subject was resolved. Two 
dedicated R functions (bkde. 2D .no. fft .radial and bkde. 2D. f ft. radial. corrected) 
are included as supplemental materials for the purpose of replication of Figure HJ The 
latter implements the corrected Wand’s algorithm. 

5 Conclusion 

In the paper we have described a very serious problem of using FFT for calculation of 
multivariate kernel estimators when unconstrained bandwidth matrices are used. Next, 
we have discovered a satisfactory solution which rectihes the problem. As a consequence, 
the results given by direct evaluation of ([5]) or by ([7]) and by the proposed FFT-based 
algorithm based on flT2l) and flT3|) are identical for any form of the H bandwidth matrix. 
Our results can be used not only for direct KDE calculations, but also for calculation of a 
class of functionals which are very important for example in optimal bandwidth selection 
for KDE. Our results have been already implemented in the ks R package, starting from 
version 1.10.0. 
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